Fourier Based Fast Multipole Method for the Helmholtz Equation
The fast multipole method (FMM) has had great success in reducing the
computational complexity of solving the boundary integral form of the Helmholtz
equation. We present a formulation of the Helmholtz FMM that uses Fourier basis
functions rather than spherical harmonics. By modifying the transfer function
in the precomputation stage of the FMM, time-critical stages of the algorithm
are accelerated by causing the interpolation operators to become
straightforward applications of fast Fourier transforms, retaining the
diagonality of the transfer function, and providing a simplified error
analysis. Using Fourier analysis, constructive algorithms are derived to a
priori determine an integration quadrature for a given error tolerance. Sharp
error bounds are derived and verified numerically. Various optimizations are
considered to reduce the number of quadrature points and reduce the cost of
computing the transfer function. Comment: 24 pages, 13 figures
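The abstract's key acceleration is that interpolation between quadrature grids becomes a plain FFT application. A minimal sketch of that mechanism, trigonometric interpolation by zero-padding Fourier coefficients, is shown below in NumPy; this illustrates the general technique, not the paper's actual operators.

```python
import numpy as np

def fft_interpolate(samples, m):
    """Upsample a periodic, band-limited signal from n to m > n points by
    zero-padding its Fourier coefficients (trigonometric interpolation).
    Assumes the signal's spectral content lies strictly below the Nyquist
    frequency of the coarse grid."""
    n = len(samples)
    coeffs = np.fft.fft(samples)
    padded = np.zeros(m, dtype=complex)
    half = n // 2
    padded[:half] = coeffs[:half]          # non-negative frequencies
    padded[-(n - half):] = coeffs[half:]   # negative frequencies
    # Rescale so the inverse transform reproduces the original sample values.
    return np.fft.ifft(padded) * (m / n)

# A band-limited test function sampled coarsely, then interpolated to a
# finer grid; the result matches the exact function to machine precision.
n, m = 16, 64
x = 2 * np.pi * np.arange(n) / n
f = np.cos(3 * x) + 0.5 * np.sin(2 * x)
fine = fft_interpolate(f, m)
x_fine = 2 * np.pi * np.arange(m) / m
exact = np.cos(3 * x_fine) + 0.5 * np.sin(2 * x_fine)
```

Because both transforms cost O(m log m), interpolation inherits FFT complexity, which is the advantage the abstract claims for the Fourier basis over spherical-harmonic interpolation.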
Tensor Contractions with Extended BLAS Kernels on CPU and GPU
Tensor contractions constitute a key computational ingredient of numerical multi-linear algebra. However, as the order and dimension of tensors grow, the time and space complexities of tensor-based computations grow quickly. In this paper, we propose and evaluate new BLAS-like primitives that are capable of performing a wide range of tensor contractions on CPU and GPU efficiently. We begin by focusing on single-index contractions involving all the possible configurations of second-order and third-order tensors. Then, we discuss extensions to more general cases. Existing approaches for tensor contractions spend large amounts of time restructuring the data, which typically involves explicit copy and transpose operations. In this work, we summarize existing approaches and present library-based approaches that avoid memory movement. Through systematic benchmarking, we demonstrate that our approach can achieve 10x speedup on a K40c GPU and 2x speedup on dual-socket Haswell-EP CPUs, using CUBLAS and MKL respectively, for small and moderate tensor sizes. This is relevant in many machine learning applications such as deep learning, where tensor sizes tend to be small but require numerous tensor contraction operations to be performed successively. Concretely, we implement a Tucker decomposition and show that using our kernels yields at least an order-of-magnitude speedup compared to state-of-the-art libraries.
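To make the single-index contraction concrete, the sketch below (illustrative NumPy, not the paper's kernels) computes C[m,n,p] = Σₖ A[m,k]·B[k,n,p] as a batch of GEMMs over strided matrix views, the access pattern a strided-batched BLAS primitive exploits to avoid copy and transpose steps.

```python
import numpy as np

# Single-index contraction: C[m,n,p] = sum_k A[m,k] * B[k,n,p].
rng = np.random.default_rng(0)
M, N, P, K = 4, 5, 6, 3
A = rng.standard_normal((M, K))
B = rng.standard_normal((K, N, P))
C = np.empty((M, N, P))

# One GEMM per batch index p. Each slice B[:, :, p] is a strided view into
# the original buffer (no copy), mirroring how a strided-batched kernel
# walks memory: fixed matrix dimensions plus a constant stride per batch.
for p in range(P):
    C[:, :, p] = A @ B[:, :, p]

# Cross-check against an explicit einsum of the same contraction.
reference = np.einsum('mk,knp->mnp', A, B)
```

On a GPU, the whole loop collapses into a single batched-GEMM launch, which is where the reported speedups for small tensors come from: one kernel launch instead of P, and no data restructuring.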
Tensor Contractions with Extended BLAS Kernels on CPU and GPU
Tensor contractions constitute a key computational ingredient of numerical
multi-linear algebra. However, as the order and dimension of tensors grow, the
time and space complexities of tensor-based computations grow quickly. Existing
approaches for tensor contractions typically involve explicit copy and
transpose operations. In this paper, we propose and evaluate a new BLAS-like
primitive STRIDEDBATCHEDGEMM that is capable of performing a wide range of
tensor contractions on CPU and GPU efficiently. Through systematic
benchmarking, we demonstrate the advantages of our approach over conventional
approaches. Concretely, we implement the Tucker decomposition and show that
using our kernels yields a 100x speedup compared to an implementation using
existing state-of-the-art libraries.
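The Tucker decomposition mentioned in both abstracts is built from mode-n products, each of which reduces to a single GEMM on an unfolded tensor. A minimal NumPy sketch of that reduction (illustrative, not the authors' implementation) is:

```python
import numpy as np

def mode_n_product(T, U, n):
    """Mode-n product T x_n U: contract factor U[j, i_n] against mode n of T.
    Implemented as one GEMM on the mode-n unfolding, the operation the
    extended BLAS kernels perform without materializing the unfolding."""
    T_moved = np.moveaxis(T, n, 0)        # bring mode n to the front
    shape = T_moved.shape
    flat = T_moved.reshape(shape[0], -1)  # mode-n unfolding (matrix view)
    out = (U @ flat).reshape((U.shape[0],) + shape[1:])
    return np.moveaxis(out, 0, n)

# Reconstruct a third-order tensor from a Tucker core and factor matrices.
rng = np.random.default_rng(1)
core = rng.standard_normal((2, 3, 2))
factors = [rng.standard_normal((5, 2)),
           rng.standard_normal((6, 3)),
           rng.standard_normal((4, 2))]
T = core
for n, U in enumerate(factors):
    T = mode_n_product(T, U, n)

# The chain of mode products equals the direct multi-linear contraction.
reference = np.einsum('abc,ia,jb,kc->ijk', core, *factors)
```

Since every step is a GEMM (or, over a batch dimension, a strided batched GEMM), a Tucker factorization or reconstruction maps directly onto the primitive the paper proposes.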
Industrialization and Structural Crisis: On the Economic Transformation of Spain in the Liberalization Period
UuStB Koeln(38)-8Y8589 / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek (SIGLE, DE, German)